feat(scanner): Issue 3 - Implement false-positive noise filter #14
Resolves #3. Co-Authored-By: Codex GPT-5 <noreply@openai.com>
Code Review
This pull request introduces a parser-level noise filter for Gitleaks findings to filter out low-signal candidates (such as template placeholders, known dummy values, repeated characters, and low-entropy short strings) before storage or verification. This feature is controlled by a new configuration option, enable_noise_filter (defaulting to True), which is integrated into the scan options and manifest parsing. The review feedback suggests optimizing regex matching performance by merging multiple individual patterns for template placeholders and false-negative prevention into single patterns using alternation. Additionally, the reviewer recommends adding defensive type checks for the Secret field to prevent potential runtime errors if the parsed value is not a string.
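Two of the heuristics named in the summary, repeated characters and low-entropy short strings, can be illustrated with a minimal sketch. The function names and both thresholds below are hypothetical and chosen only for illustration; the PR's actual values are not shown in this conversation.

```python
import math
from collections import Counter

# Hypothetical thresholds for illustration; not taken from the PR.
SHORT_LENGTH = 12
MIN_ENTROPY_BITS = 2.5


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_repeated_char(value: str) -> bool:
    """True when the whole string is a single character repeated."""
    return len(value) > 1 and len(set(value)) == 1


def is_low_entropy_short(value: str) -> bool:
    """True for short strings whose per-character entropy is below the threshold."""
    return len(value) <= SHORT_LENGTH and shannon_entropy(value) < MIN_ENTROPY_BITS
```

Restricting the entropy check to short strings is one way to avoid discarding long, structured secrets that happen to use a small alphabet.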
Resolve PR review feedback for the Gitleaks noise filter by combining repeated regex checks into single compiled patterns and handling non-string Secret values defensively. Co-Authored-By: Codex GPT-5 <noreply@openai.com>
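A minimal sketch of the two changes in this commit, under stated assumptions: the placeholder shapes in the pattern and the helper name `is_placeholder_secret` are hypothetical, and treating a non-string `Secret` as "not noise" is an assumed policy, since the commit message does not say how such values are handled.

```python
import re
from typing import Any, Mapping

# One compiled pattern with alternation instead of several separate checks.
# The placeholder shapes listed here are hypothetical examples.
_PLACEHOLDER_RE = re.compile(
    r"""
    ^(?:
        \$\{[^}]*\}        # shell-style variable in braces
      | \{\{[^}]*\}\}      # double-brace template variable
      | <[^<>]+>           # angle-bracket placeholder
      | %[A-Za-z_]+%       # percent-delimited variable
    )$
    """,
    re.VERBOSE,
)


def is_placeholder_secret(item: Mapping[str, Any]) -> bool:
    """Classify a raw Gitleaks item as a template placeholder."""
    secret = item.get("Secret")
    # Defensive check: a missing or non-string Secret is never matched,
    # so it cannot raise and (by assumption) is not treated as noise.
    if not isinstance(secret, str):
        return False
    return _PLACEHOLDER_RE.match(secret.strip()) is not None
```

Compiling the alternation once at import time means each finding costs a single `match` call rather than one call per pattern.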
Purpose & Motivation
Resolves #3.
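Filters obvious false-positive noise from Gitleaks findings, such as template placeholders, dummy values, repeated characters, and low-entropy strings, before they reach the LLM verifier. The goal is to reduce storage/write volume and verifier cost, and to improve signal quality in the report/evaluate stages.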
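The ordering described above, dropping noisy raw findings before anything downstream sees them, can be sketched as follows. This is an illustration, not the PR's code: `ScanOptions` here is a stand-in that carries only the `enable_noise_filter` flag, and `filter_then_map`, `is_noise`, and `map_item` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Item = Mapping[str, Any]


@dataclass(frozen=True)
class ScanOptions:
    # Stand-in with only the flag named in this PR; other fields are omitted.
    enable_noise_filter: bool = True


def filter_then_map(
    raw_items: Iterable[Item],
    options: ScanOptions,
    is_noise: Callable[[Item], bool],
    map_item: Callable[[Item], Any],
) -> list[Any]:
    """Drop noisy raw items first, then map the survivors."""
    results = []
    for item in raw_items:
        # With the flag off, every item is mapped regardless of is_noise.
        if options.enable_noise_filter and is_noise(item):
            continue
        results.append(map_item(item))
    return results
```

Filtering before mapping means discarded items never incur mapping, storage, or verification cost, and turning the flag off restores the unfiltered behavior.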
Context
- Documented the parser-stage noise filter decision in docs/views/research-and-technical-decisions.md.
- Added a noise classifier in src/security_scanner/scanners/gitleaks/filter.py.
- parse_gitleaks_report() now filters raw Gitleaks items before calling map_gitleaks_item().
- Added ScanOptions.enable_noise_filter, controllable via scan.enable_noise_filter in the manifest.
- Wired GitleaksScanner.scan() to pass scan_options through to the parser.
Note
Please pay particular attention to the following when reviewing.
- Whether the enable_noise_filter=False path works in the parser, scanner, and manifest alike.
Verification:
- uv run pytest → 366 passed
- git diff --check → clean
- cr review --agent -t committed --base origin/main → minor test coverage finding addressed
Dependency
Checklist